Human coronaviruses (HCV) are ubiquitous pathogens which cause respiratory, gastrointestinal, and
possibly neurological disorders. To better understand the molecular biology of the prototype
HCV-229E strain, the complete nucleotide sequence of the membrane protein (M) gene was determined
from cloned cDNA. The open reading frame is preceded by a consensus transcriptional initiation
sequence UCUAAACU, identical to the one found upstream of the N gene. The M gene encodes a
225-amino acid polypeptide with a molecular weight (MW) of 25,822, slightly higher than the
apparent MW of 19,000-22,000 observed for the unprocessed M protein obtained after in vitro
translation and immunoprecipitation. The M amino acid sequence presents a significant degree of
homology (38%) with its counterpart of transmissible gastroenteritis coronavirus (TGEV). The M
protein of HCV-229E is highly hydrophobic and its hydropathicity profile shows a transmembranous
region composed of three major hydrophobic domains characteristic of a typical coronavirus M
protein. About 10% (20 amino acids) of the HCV-229E M protein constitutes a hydrophilic and
probably external portion. One N-glycosylation and three potential O-glycosylation sites are found
in this exposed domain. © 1990 Academic Press, Inc.


Human coronaviruses (HCV) belong to either one of two antigenic groups, represented by the
prototype strains 229E and OC43 (7). They are responsible for as much as 25% of common colds
(2, 3) and have been associated with gastrointestinal disorders (4). Their possible involvement
in neurological diseases was suggested by the observation of coronavirus-like particles in the
brain of one multiple sclerosis (MS) patient (6), the isolation of coronaviruses from two MS
brain tissues passaged in mice (6), and the detection of intrathecal antibodies to HCV-OC43 and
HCV-229E in MS patients (7). However, the association of human coronaviruses with neurological
diseases has not yet been confirmed.

HCV-229E possesses a single-stranded, positive-sense RNA genome with a molecular weight of
5.8 × 10⁶ and a poly(A) tail of about 70 nucleotides at the 3' end (8). As with other
coronaviruses, six subgenomic RNAs are synthesized in infected cells (9). These appear to have
lower molecular weights than viral RNAs synthesized in cells infected with murine hepatitis virus
(MHV). At least four polypeptides have been found in purified HCV-229E virions: 160- to 200-kDa
and 88- to 105-kDa glycoproteins which may be analogous to the spike glycoprotein S (previously
designated E2) of MHV (10); a 47- to 53-kDa polypeptide corresponding to the nucleocapsid protein
N; and a 17- to 26-kDa M protein (previously designated E1) observed in both glycosylated and
nonglycosylated forms (11-14). One author also reported glycoproteins of 31 and 65 kDa (11).

The nucleotide sequences of the genes encoding the nucleocapsid proteins as well as the mRNA
leader sequences of HCV-229E and HCV-OC43 have recently been determined (15, 16). As a
continuation of these studies, we report the nucleotide sequence of the gene encoding the
membrane protein M of HCV-229E. Its predicted amino acid sequence is compared with sequences
determined for other coronaviruses.

Clones containing the sequence of the M protein gene were obtained from a cDNA library
constructed with mRNA isolated from HCV-229E-infected L132 cells, and identified using a
genome-specific probe (18). One clone, designated L8, was selected for sequencing since it
contained a large 3.6-kb insert overlapping by 1.2 kb the 5' end of the N protein gene. The
remaining 2.4-kb fragment was excised from an internal PstI site of clone L8 and subcloned into
the pBluescript II vector (Stratagene). Unidirectional deletions of the 2.4-kb insert were
created using exonuclease III, mung bean nuclease, and deoxythionucleotide derivatives
(Stratagene). The sequencing of both strands was performed by the plasmid sequencing technique
(17), using T7 DNA polymerase. In vitro translation of poly(A)+ mRNAs isolated from
HCV-229E-infected L132 cells was carried out in order to determine the molecular mass of the
unprocessed viral polypeptides.

5'- CTACTAGTGTGTATTACAATAATTAAACTAACTAAGCTTTGTTTCACTTGCCATATGTTTTGTACTAGAACAATT
TATGGCCCCATTAAAAATGTGTACCACATTTACCAATCATATATGCACATAGACCCTTTCCCTAAACGAGTTATTGATC

TCTAAACTAAACGACA ATG TCA AAT GAC AAT TGT ACG GGT GAC ATT GTC ACC CAT TTG AAG AAT
                  M   S   N   D   N   C   T   G   D   I   V   T   H   L   K   N
                      *           ●       *                   *

TGG AAT TTT GGT TGG AAT GTT ATT CTA ACC ATA TTC ATT GTT ATT CTT CAG TTT GGA CAC
 W   N   F   G   W   N   V   I   L   T   I   F   I   V   I   L   Q   F   G   H

TAT AAA TAC TCC AGA TTG CTT TAT GGT TTG AAG ATG CTT GTA CTG TGG CTT CTT TGG CCA
 Y   K   Y   S   R   L   L   Y   G   L   K   M   L   V   L   W   L   L   W   P

CTC GTA CTT GCT TTG TCA ATC TTT GAC ACC TGG GCT AAT TGG GAT TCT AAT TGG GCC TTT
 L   V   L   A   L   S   I   F   D   T   W   A   N   W   D   S   N   W   A   F

GTT GCA TTT AGC CTT CTT ATG GCC GTA TCA ACA CTC GTT ATG TGG GTG ATG TAC TTC GCA
 V   A   F   S   L   L   M   A   V   S   T   L   V   M   W   V   M   Y   F   A

AAC AGT TTC AGA CTT TTC CGA CGT GCT CGA ACT TTT TGG GCA TGG AAT CCT GAG GTC AAT
 N   S   F   R   L   F   R   R   A   R   T   F   W   A   W   N   P   E   V   N

GCA ATC ACT GTC ACA ACC GTG TTG GGA CAG ACA TAC TAT CAA CCC ATT CAA CAA GCT CCA
 A   I   T   V   T   T   V   L   G   Q   T   Y   Y   Q   P   I   Q   Q   A   P

ACA GGC ATT ACT GTG ACC TTG CTG AGC GGC GTG CTT TAC GTT GAC GGA CAT AGA TTG GCT
 T   G   I   T   V   T   L   L   S   G   V   L   Y   V   D   G   H   R   L   A

TCA GGT GTT CAG GTT CAT AAC CTA CCT GAA TAC ATG ACA GTT GCC GTG CCG AGC ACT ACT
 S   G   V   Q   V   H   N   L   P   E   Y   M   T   V   A   V   P   S   T   T

ATA ATT TAT AGT AGA GTC GGA AGG TCC GTA AAT TCA CAA AAT AGC ACA GGC TCG GTT TTC
 I   I   Y   S   R   V   G   R   S   V   N   S   Q   N   S   T   G   W   V   F
                                                     ●

TAC GTA GGA GTA AAA CAC GGT GAT TTT TCT GCA GTG AGC TCT CCC ATG AGC AAC ATG ACA
 Y   V   R   V   K   H   G   D   F   S   A   V   S   S   P   M   S   N   M   T

GAA AAC GAA AGA TTG CTT CAT TTT TTC TAA ACTGAACGAAAAG ATG -3'
 E   N   E   R   L   L   H   F   F

Fig. 1. Complete nucleotide sequence of the M protein gene of HCV-229E and its predicted amino
acid sequence. The leader sequences are underlined and the potential N-glycosylation (●) and
O-glycosylation (*) sites at the putatively external N-terminus of the polypeptide are also
indicated. Two other N-glycosylation sites are found at the C-terminus of the protein. The
nucleotide sequence from position 792 is from Ref. (15).

The complete nucleotide sequence of the M protein gene of HCV-229E and its predicted amino acid
sequence are presented in Fig. 1. The AUG codon is preceded by the consensus intergenic sequence
UCUAAACU, which is identical to that upstream of the nucleocapsid protein-coding sequence (15;
and Fig. 1). This sequence is conserved among coronaviruses of various species and represents
the binding site of the leader RNA which mediates a discontinuous transcription of mRNAs (18).
The longest open reading frame extends from base 171 through base 848 and encodes a 225-amino
acid polypeptide with a calculated molecular weight of 25,822. The products of in vitro
translation of poly(A)+ mRNAs from HCV-229E-infected cells were precipitated with a polyclonal
antiserum prepared against purified HCV-229E virions. As shown in Fig. 2, six viral polypeptides
were observed, which migrated with apparent molecular masses of 98, 58, 42, 28.5, 19, and 14 kDa,
respectively. Although the identity of these proteins has not been firmly established, by
comparison with other coronaviruses, p98 probably corresponds to S, p58 to N, and p19 to M. The
nature of p42 and p28.5 is not known at this time. Thus, the molecular mass of M predicted from
the nucleotide sequence is slightly higher than the molecular mass estimated by SDS-PAGE. Other
studies have shown that the mature M protein has a molecular mass of 23 to 26 kDa (12-14) and
that virions also incorporate a nonglycosylated 20- to 22-kDa precursor of the M protein
(12, 14). The latter observation is consistent with the identification of in vitro translated
p19 as M. The lower apparent molecular mass of M estimated by SDS-PAGE is consistent with the
unusual electrophoretic behavior of this and other hydrophobic proteins, as was observed for
MHV (19).

Fig. 2. Immunoprecipitation of in vitro translation products from HCV-229E mRNAs. Poly(A)+ mRNAs
were translated in the presence of [35S]methionine, using a rabbit reticulocyte lysate (Promega
Biotec). The viral polypeptides were immunoprecipitated and separated by SDS-PAGE (13%
acrylamide). Lane 1, molecular mass standards; lane 2, mRNAs from HCV-229E-infected cells;
lane 3, mRNAs from noninfected cells; lane 4, translation without exogenous mRNA. Molecular mass
standards (kDa) are indicated on the left. The calculated molecular masses of relevant viral
proteins (kDa) are indicated on the right.
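The coordinates quoted in the text can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the excerpt is the 16 nucleotides printed immediately upstream of the ATG in Fig. 1 followed by the first five codons, and the excerpt's starting position (155) is an assumption derived from the stated ORF start at base 171.

```python
# Illustrative check of the consensus-sequence position and the ORF arithmetic.
# All names are hypothetical; the values come from the text and Fig. 1.

CONSENSUS_RNA = "UCUAAACU"  # consensus intergenic sequence quoted in the text

# 16 nt upstream of the ATG plus the first five codons, written as cDNA.
EXCERPT = "TCTAAACTAAACGACA" + "ATGTCAAATGACAAT"
EXCERPT_START = 155  # assumed 1-based position of the first excerpt base


def find_motif(dna, motif_rna, first_position=1):
    """Return 1-based start positions of an RNA motif in a cDNA string."""
    motif = motif_rna.replace("U", "T")
    positions = []
    index = dna.find(motif)
    while index != -1:
        positions.append(index + first_position)
        index = dna.find(motif, index + 1)
    return positions


def orf_summary(first_base, last_base):
    """Length of an ORF given inclusive 1-based coordinates (stop codon included)."""
    nucleotides = last_base - first_base + 1
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = nucleotides // 3
    return {"nucleotides": nucleotides, "codons": codons, "residues": codons - 1}


motif_positions = find_motif(EXCERPT, CONSENSUS_RNA, EXCERPT_START)
atg_position = EXCERPT.find("ATG") + EXCERPT_START
summary = orf_summary(171, 848)

print(motif_positions, atg_position, summary)
```

Under these assumptions the consensus sequence ends eight nucleotides before the ATG, and the interval from base 171 through base 848 spans 226 codons, that is, 225 residues plus the stop codon, which agrees with the length reported for the M polypeptide.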

As in TGEV (20), there are three amino acid sequences characteristic of N-glycosylation sites in
the predicted M protein sequence (Asn-5, Asn-190, and Asn-214), although only one (Asn-5) is
found near the N-terminus, as compared to two for TGEV. Moreover, three potential
O-glycosylation sites are located in the putatively external N-terminus of the polypeptide
(Ser-2, Thr-7, and Thr-12). In addition, there is only one cysteine residue (Cys-6). Other
coronavirus M proteins contain two (bovine coronavirus, BCV; Ref. (27)), four (MHV-A59 and JHM;
Refs. (19) and (22), respectively), eight (TGEV; Ref. (20)), or nine (infectious bronchitis
virus, IBV; Ref. (23)) cysteine residues. The single cysteine residue of HCV-229E M is probably
important in forming interchain disulfide bridges, since M of HCV-229E has been shown to form
oligomers under nonreducing conditions (14).
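The site positions listed above can be reproduced by scanning the predicted sequence for the Asn-X-Ser/Thr motif. The sketch below is a minimal illustration, not the authors' procedure: the sequence is transcribed from the amino acid line of Fig. 1, the exclusion of proline at the X position is the commonly used form of the rule and is an assumption here, and the function name is hypothetical.

```python
import re

# Predicted M protein sequence as read from the amino acid line of Fig. 1.
M_PROTEIN = (
    "MSNDNCTGDIVTHLKN"
    "WNFGWNVILTIFIVILQFGH"
    "YKYSRLLYGLKMLVLWLLWP"
    "LVLALSIFDTWANWDSNWAF"
    "VAFSLLMAVSTLVMWVMYFA"
    "NSFRLFRRARTFWAWNPEVN"
    "AITVTTVLGQTYYQPIQQAP"
    "TGITVTLLSGVLYVDGHRLA"
    "SGVQVHNLPEYMTVAVPSTT"
    "IIYSRVGRSVNSQNSTGWVF"
    "YVRVKHGDFSAVSSPMSNMT"
    "ENERLLHFF"
)

# Asn, any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping candidates be reported independently.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in N-X-S/T motifs."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


sites = n_glycosylation_sites(M_PROTEIN)
cysteines = [i + 1 for i, residue in enumerate(M_PROTEIN) if residue == "C"]

print(len(M_PROTEIN), sites, cysteines)
```

A motif match only marks a potential site; whether a given asparagine is actually glycosylated depends on its accessibility, which is why the text distinguishes Asn-5 in the putatively external N-terminus from the two C-terminal candidates.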

No significant nucleotide sequence homology exists between the M genes of HCV-229E and other
coronaviruses. The highest M amino acid homology (38% or 100 of 262 residues) occurs between
HCV-229E and TGEV, which was reported to be antigenically related (24). Antigenically distinct
BCV, MHV, and IBV show amino acid homologies of 32, 30, and 28%, respectively. In contrast, a
homology of 87% was found between the M proteins of BCV and MHV-A59 (27), which belong to
another antigenic subgroup (24). On the other hand, a homology of 34% was found between the M
proteins of TGEV and BCV (25), which belong to two different antigenic subgroups. Figure 3
illustrates the M regions common to both HCV-229E and TGEV.
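The homology figures above are percent identities, the number of identical aligned residues divided by a reference length. The sketch below shows the calculation under stated assumptions: the function names are hypothetical, the two aligned strings are invented toy sequences rather than the HCV-229E and TGEV alignment of Fig. 3, and the only data taken from the text are the counts 100 and 262.

```python
# Percent identity from an alignment; names and toy sequences are hypothetical.


def count_identities(aligned_a, aligned_b, gap="-"):
    """Count alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)


def percent_identity(identical, reference_length):
    """Identity as a percentage of the chosen reference length."""
    return 100.0 * identical / reference_length


# Value quoted in the text: 100 identical residues out of 262.
reported = percent_identity(100, 262)

# Toy alignment (invented) to show how the count is obtained.
toy_a = "MSNDNCT-GD"
toy_b = "MSSDNCTAGE"
toy_matches = count_identities(toy_a, toy_b)
toy_identity = percent_identity(toy_matches, len(toy_a))

print(round(reported, 1), toy_matches, toy_identity)
```

The choice of denominator (alignment length, or the length of one of the two proteins) changes the percentage, so values from different studies are comparable only when the same convention is used.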

As with other coronaviruses, the M protein of HCV-229E is a highly hydrophobic membrane protein.
It contains 51% hydrophobic residues, compared to 45-51% for other coronaviruses (19-23, 25).
The hydropathicity profiles of M proteins from HCV-229E, TGEV, BCV, MHV-JHM, MHV-A59, and IBV
are presented in Fig. 4. The main features characterizing these M proteins include three large
hydrophobic domains alternating with short hydrophilic regions. This suggests a selective
pressure to maintain the potential transmembranous domains of this coronavirus protein. As with
other coronaviruses (26), only about 10% (20 amino acids) of the HCV-229E M protein constitutes
the hydrophilic putative external domain. On the other hand, the large N-terminal putative
signal sequence found only in the M protein of TGEV (20, 25) is not observed in HCV-229E, which
is similar to the structure reported for BCV, MHV-JHM, MHV-A59, and IBV.

Fig. 3. Comparison of the predicted amino acid sequences of M proteins of HCV-229E (top row) and
TGEV (bottom row) aligned for maximum homology. Regions common to both proteins are underlined.
The analysis was performed on an Apple Macintosh Plus computer with the MacGene Plus program
(Applied Genetic Technology Inc., Fairview Park, OH).

The coronavirus M protein is important for several reasons. This membrane protein is implicated
in virus assembly and is believed to integrate the viral proteins prior to budding, most likely
because of this protein's affinity for RNA (26). Moreover, some monoclonal antibodies against
the M protein of MHV-JHM are protective in vivo and thus may influence the outcome of disease
(27). We are currently pursuing the cloning and sequencing of other genes of HCV-229E, with
emphasis on the gene coding for the spike protein S, which is potentially important in viral
pathogenicity. The availability of molecular probes for human coronavirus genes opens new
avenues for the verification of the potential involvement of these viruses in neurological
diseases.


Fig. 4. Hydropathicity profiles of M proteins from HCV-229E, TGEV, BCV, MHV-JHM, MHV-A59, and
IBV determined according to Kyte and Doolittle (28). The analysis was performed with the MacGene
Plus program as described in the legend to Fig. 3. Each point is the mean hydropathicity of a
span of seven residues. Peaks extending upwards correspond to hydrophobic regions and peaks
extending downwards to hydrophilic areas.
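The profile described in the legend to Fig. 4 is a moving average of Kyte and Doolittle hydropathy values over seven residues. The Python sketch below illustrates that calculation; it is not the MacGene Plus implementation, the function name is hypothetical, and the input is the first 36 residues of the HCV-229E M protein as read from Fig. 1.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every consecutive span of `window` residues."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


# First 36 residues of the predicted M protein, transcribed from Fig. 1.
N_TERMINUS = "MSNDNCTGDIVTHLKNWNFGWNVILTIFIVILQFGH"

profile = hydropathy_profile(N_TERMINUS)

# Print the first residue of each span with its mean hydropathy.
for start, value in enumerate(profile, start=1):
    print(start, round(value, 2))
```

Positive means correspond to the upward, hydrophobic peaks of Fig. 4 and negative means to the hydrophilic troughs. In this excerpt the span starting at residue 1 is negative, in line with a hydrophilic N-terminus, while the span starting at residue 23 is strongly positive.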